
Sam Altman Celebrates ChatGPT Finally Following Em Dash Formatting Rules

An anonymous reader quotes a report from Ars Technica: On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. "Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it's supposed to do!" he wrote. The post, which came two days after the release of OpenAI's new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this "small win" raises a very big question: If the world's most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim. "The fact that it's been 3 years since ChatGPT first launched, and you've only just now managed to make it obey this simple requirement, says a lot about how little control you have over it, and your understanding of its inner workings," wrote one X user in a reply. "Not a good sign for the future."



I can use WM_COPYDATA to send a block of data to another window, but how does it send data back?


The WM_COPYDATA message can be used to send a blob of data from one window to another. The window manager does the work of copying the data from the sending process to the receiving process, but how does the receiving process send data back?

If the only information that needs to come back is a success/failure, the recipient can return TRUE on success or FALSE on failure.

But if you need to return more information, then you have a few choices.

One is to have the receiving window send the results back by sending its own WM_COPYDATA message to the sending window. (The sending window passes its handle in the wParam.) The data blob can contain a transaction ID or some other way to distinguish which WM_COPYDATA the recipient is responding to.
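
As a concrete illustration, here is a minimal sketch of that reply pattern in Win32 C++. The RequestHeader and ReplyHeader structures and the dwTransactionId field are hypothetical payload layouts invented for the example; only WM_COPYDATA, COPYDATASTRUCT, and SendMessage come from the Windows API.

#include <windows.h>

struct RequestHeader {
    DWORD dwTransactionId;   // lets the sender match replies to requests
    // request payload follows
};

struct ReplyHeader {
    DWORD dwTransactionId;   // echoed back from the request
    // reply payload follows
};

// Called from the receiving window's window procedure for WM_COPYDATA:
LRESULT OnCopyData(HWND hwnd, WPARAM wParam, LPARAM lParam)
{
    HWND hwndSender = reinterpret_cast<HWND>(wParam);   // sender passed its handle here
    auto* pcds = reinterpret_cast<COPYDATASTRUCT*>(lParam);
    auto* request = static_cast<RequestHeader*>(pcds->lpData);

    // Build a reply that echoes the transaction ID so the sender can
    // tell which of its outstanding requests this reply answers.
    ReplyHeader reply = { request->dwTransactionId };

    COPYDATASTRUCT cds = {};
    cds.dwData = 1;                  // app-defined code meaning "this is a reply"
    cds.cbData = sizeof(reply);
    cds.lpData = &reply;

    SendMessage(hwndSender, WM_COPYDATA,
                reinterpret_cast<WPARAM>(hwnd),
                reinterpret_cast<LPARAM>(&cds));
    return TRUE;                     // original WM_COPYDATA handled successfully
}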

Another way is for the sending window to create a shared memory block, duplicate the shared handle into the receiving window’s process,¹ and then pass the duplicated handle in the WM_COPYDATA payload. The receiving window can use MapViewOfFile to access the shared memory block and write its results there. Of course, if you’re going to do it this way, then you don’t really need WM_COPYDATA; you can just use a custom message and pass the handle in, say, the wParam.
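
Below is a minimal sketch of the sending side of that approach, assuming the sender runs at an integrity level equal to or higher than the recipient (the footnote covers the reverse case). Error handling is omitted, and the WM_APP-based message value is an arbitrary choice for the example.

#include <windows.h>

void SendRequestWithSharedResultBuffer(HWND hwndRecipient)
{
    // Anonymous (unnamed) file mapping backed by the page file.
    HANDLE hMapping = CreateFileMapping(INVALID_HANDLE_VALUE, nullptr,
                                        PAGE_READWRITE, 0, 4096, nullptr);

    // Find the recipient's process so the handle can be duplicated into it.
    DWORD recipientPid = 0;
    GetWindowThreadProcessId(hwndRecipient, &recipientPid);
    HANDLE hRecipientProcess = OpenProcess(PROCESS_DUP_HANDLE, FALSE, recipientPid);

    HANDLE hRemote = nullptr;        // handle value valid in the recipient's process
    DuplicateHandle(GetCurrentProcess(), hMapping,
                    hRecipientProcess, &hRemote,
                    0, FALSE, DUPLICATE_SAME_ACCESS);

    // Hand the duplicated handle to the recipient. Since only a handle value is
    // being passed, a custom message suffices; WM_COPYDATA isn't required.
    SendMessage(hwndRecipient, WM_APP + 1, reinterpret_cast<WPARAM>(hRemote), 0);

    // The recipient maps the block with MapViewOfFile(FILE_MAP_WRITE, ...) and
    // writes its results there. Back here, read them out of our own handle.
    void* view = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
    // ... consume the results ...
    UnmapViewOfFile(view);

    CloseHandle(hRecipientProcess);
    CloseHandle(hMapping);
}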

A customer said that they were worried that if they created a shared memory block with CreateFileMapping, the memory would become visible to all other processes, not just the two processes trying to talk to each other.

Maybe they were thinking about named shared memory blocks, which are accessible to anybody who knows (or can guess) the name, and for whom access is granted by the shared memory block’s access control list.

So don’t use a named shared memory block. Use an anonymous one. The only way to get access to an anonymous shared memory block is to get access to its handle.

So your exposure is not to all processes but just to processes which have “duplicate handle” permission. And if somebody has “duplicate handle” permission on your process, then they already pwn your process: They can duplicate the GetCurrentProcess() handle out of your process, and that gives them a handle with full access to your process. Your exposure is only to people who are already on the other side of the airtight hatchway.

¹ This assumes that the sending process is running at an equal or higher integrity level than the recipient. If the roles are reversed, with a low integrity process sending to a high integrity process, you can delegate the duplication to the recipient. The low integrity sending process allocates the shared memory and puts the handle into the WM_COPYDATA memory block. The recipient can then call the DuplicateHandle function to duplicate the handle out of the sending process, using GetWindowThreadProcessId to get the sender’s process ID. You can include information in the WM_COPYDATA memory block to indicate that you are in this reverse case.
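
Here is a sketch of how the recipient side of that reverse case might look, under the same assumptions; the SharedResultRequest layout and its field names are hypothetical, invented for the example.

#include <windows.h>

struct SharedResultRequest {
    BOOL   fSenderOwnsMapping;   // flag marking the reverse (low-integrity sender) case
    HANDLE hMappingInSender;     // mapping handle value valid only in the sender's process
};

// Recipient's WM_COPYDATA handler for the reverse case:
LRESULT OnCopyDataReverse(HWND /*hwnd*/, WPARAM wParam, LPARAM lParam)
{
    HWND hwndSender = reinterpret_cast<HWND>(wParam);
    auto* pcds = reinterpret_cast<COPYDATASTRUCT*>(lParam);
    auto* request = static_cast<SharedResultRequest*>(pcds->lpData);

    if (request->fSenderOwnsMapping) {
        // Find the sending process and duplicate its mapping handle into ours.
        DWORD senderPid = 0;
        GetWindowThreadProcessId(hwndSender, &senderPid);
        HANDLE hSenderProcess = OpenProcess(PROCESS_DUP_HANDLE, FALSE, senderPid);

        HANDLE hLocalMapping = nullptr;
        DuplicateHandle(hSenderProcess, request->hMappingInSender,
                        GetCurrentProcess(), &hLocalMapping,
                        0, FALSE, DUPLICATE_SAME_ACCESS);

        // Write the results into the shared memory block for the sender to read.
        void* view = MapViewOfFile(hLocalMapping, FILE_MAP_WRITE, 0, 0, 0);
        // ... fill in results ...
        UnmapViewOfFile(view);

        CloseHandle(hLocalMapping);
        CloseHandle(hSenderProcess);
    }
    return TRUE;
}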

The post I can use WM_COPYDATA to send a block of data to another window, but how does it send data back? appeared first on The Old New Thing.


ClickFix may be the biggest security threat your family has never heard of


Over the past year, scammers have ramped up a new way to infect the computers of unsuspecting people. The increasingly common method, which many potential targets have yet to learn of, is quick, bypasses most endpoint protections, and works against both macOS and Windows users.

ClickFix often starts with an email sent from a hotel where the target has a pending reservation, and it references the correct booking details. In other cases, ClickFix attacks begin with a WhatsApp message. In still other cases, the user receives the URL at the top of Google results for a search query. Once the mark accesses the malicious site, it presents a CAPTCHA challenge or some other pretext requiring user confirmation. The user is then instructed to copy a string of text, open a terminal window, paste it in, and press Enter.

One line is all it takes

Once entered, the string of text causes the PC or Mac to surreptitiously visit a scammer-controlled server and download malware. Then, the machine automatically installs it—all with no indication to the target. With that, users are infected, usually with credential-stealing malware. Security firms say ClickFix campaigns have run rampant. The lack of awareness of the technique, links that arrive from known addresses or appear at the top of search results, and the ability to bypass some endpoint protections are all factors driving the growth.

“This campaign highlights that leveraging malvertising and the one-line installation-command technique to distribute macOS information stealers remains popular among eCrime actors,” researchers from CrowdStrike wrote in a report documenting a particularly polished campaign designed to infect Macs with a Mach-O executable, a common binary that runs on macOS. “Promoting false malicious websites encourages more site traffic, which will lead to more potential victims. The one-line installation command enables eCrime actors to directly install the Mach-O executable onto the victim’s machine while bypassing Gatekeeper checks.”

The primary piece of malware installed in that campaign is a credential-stealer tracked as Shamos. Other payloads included a malicious cryptocurrency wallet, software for making the Mac part of a botnet, and macOS configuration changes to allow the malware to run each time the machine reboots.

Another campaign, documented by Sekoia, targeted Windows users. The attackers behind it first compromise a hotel’s account for Booking.com or another online travel service. Using the information stored in the compromised accounts, the attackers contact people with pending reservations, an ability that builds immediate trust with many targets, who are eager to comply with instructions, lest their stay be canceled.

The site eventually presents a fake CAPTCHA notification that bears an almost identical look and feel to those required by content delivery network Cloudflare. The proof the notification requires for confirmation that there’s a human behind the keyboard is to copy a string of text and paste it into the Windows terminal. With that, the machine is infected with malware tracked as PureRAT.

Push Security, meanwhile, reported a ClickFix campaign with a page “adapting to the device that you’re visiting from.” Depending on the OS, the page will deliver payloads for Windows or macOS. Many of these payloads, Microsoft said, are LOLbins, the name for binaries that use a technique known as living off the land. These scripts rely solely on native capabilities built into the operating system. With no malicious files being written to disk, endpoint protection is further hamstrung.

The commands, which are often Base64-encoded to make them unreadable to humans, are copied inside the browser sandbox, a part of most browsers that accesses the Internet in an isolated environment designed to protect devices from malware or harmful scripts. Many security tools are unable to observe these actions and flag them as potentially malicious.

The attacks can also be effective given the lack of awareness. Many people have learned over the years to be suspicious of links in emails or messengers. In many users’ minds, the precaution doesn’t extend to sites that instruct them to copy a piece of text and paste it into an unfamiliar window. When the instructions come in emails from a known hotel or at the top of Google results, targets can be further caught off guard.

With many families gathering in the coming weeks for various holiday dinners, ClickFix scams are worth mentioning to those family members who ask for security advice. Microsoft Defender and other endpoint protection programs offer some defenses against these attacks, but they can, in some cases, be bypassed. That means that, for now, awareness is the best countermeasure.


What Are the Best Ways for Humans to Explore Space?

Should we leave space exploration to robots — or prioritize human spaceflight, making us a multiplanetary species? Harvard professor Robin Wordsworth, who's researched the evolution and habitability of terrestrial-type planets, shares his thoughts:

In space, as on Earth, industrial structures degrade with time, and a truly sustainable life support system must have the capability to rebuild and recycle them. We've only partially solved this problem on Earth, which is why industrial civilization is currently causing serious environmental damage. There are no inherent physical limitations to life in the solar system beyond Earth — both elemental building blocks and energy from the sun are abundant — but technological society, which developed as an outgrowth of the biosphere, cannot yet exist independently of it. The challenge of building and maintaining robust life-support systems for humans beyond Earth is a key reason why a machine-dominated approach to space exploration is so appealing...

However, it's notable that machines in space have not yet accomplished a basic task that biology performs continuously on Earth: acquiring raw materials and utilizing them for self-repair and growth. To many, this critical distinction is what separates living from non-living systems... The most advanced designs for self-assembling robots today begin with small subcomponents that must be manufactured separately beforehand. Overall, industrial technology remains Earth-centric in many important ways. Supply chains for electronic components are long and complex, and many raw materials are hard to source off-world...

If we view the future expansion of life into space in a similar way as the emergence of complex life on land in the Paleozoic era, we can predict that new forms will emerge, shaped by their changed environment, while many historical characteristics will be preserved. For machine technology in the near term, evolution in a more life-like direction seems likely, with greater focus on regenerative parts and recycling, as well as increasingly sophisticated self-assembly capabilities. The inherent cost of transporting material out of Earth's gravity well will provide a particularly strong incentive for this to happen.

If building space habitats is hard and machine technology is gradually developing more life-like capabilities, does this mean we humans might as well remain Earth-bound forever? This feels hard to accept because exploration is an intrinsic part of the human spirit... To me, the eventual extension of the entire biosphere beyond Earth, rather than either just robots or humans surrounded by mechanical life-support systems, seems like the most interesting and inspiring future possibility. Initially, this could take the form of enclosed habitats capable of supporting closed-loop ecosystems, on the moon, Mars or water-rich asteroids, in the mold of Biosphere 2. Habitats would be manufactured industrially or grown organically from locally available materials. Over time, technological advances and adaptation, whether natural or guided, would allow the spread of life to an increasingly wide range of locations in the solar system.

The article ponders the benefits (and the history) of both approaches — with some fascinating insights along the way. "If genuine alien life is out there somewhere, we'll have a much better chance of comprehending it once we have direct experience of sustaining life beyond our home planet."



FDA described as “clown show” amid latest scandal; top drug regulator is out


An alleged extortion attempt, a petty yearslong grudge, shocking social media posts, and ominous text messages make up the latest scandal at the Food and Drug Administration, an agency that industry outsiders are calling a “clown show” and “soap opera” under the Trump administration’s leadership, according to reporting by Stat News.

Federal health agencies, in general, have taken heavy blows in Trump’s second term. The Centers for Disease Control and Prevention, in particular, has seen the abrupt dismantling of whole programs and divisions—teams that provide critical health services to Americans. CDC staff regularly describe being demoralized over the last year. Their Senate-confirmed director didn’t make it a full month before being dramatically ousted after allegedly refusing to rubber-stamp vaccine recommendations from a panel that anti-vaccine Health Secretary Robert F. Kennedy Jr. had filled with vaccine skeptics.

While the CDC is in shambles, the FDA has turned into something of a sideshow, with concern mounting over whether it remains a serious enough regulator to keep America’s medicines and treatments modern and safe. Many of the scandals are tied to Vinay Prasad, the Trump administration’s top vaccine regulator, who also holds the titles of chief medical officer and chief scientific officer. Prasad made a name for himself on social media during the pandemic as a COVID-19 response skeptic and, since joining the FDA, has been known for overruling agency scientists and sowing distrust, unrest, and paranoia among staff. He was pushed out of the agency in July only to be reinstated about two weeks later.

However, the FDA’s latest scandal includes a different Trump-era leader: the top drug regulator, George Tidmarsh, who left the FDA this weekend amid a flurry of events. The drama centers around allegations that, since joining the FDA in July, Tidmarsh used his position to exact petty revenge on an old business associate, Kevin Tang, who had asked Tidmarsh to resign from three companies six years ago, allegedly sparking a long-standing grudge.

Drama

According to reports, Tidmarsh was placed on administrative leave on Friday as an investigation by the inspector general for the Department of Health and Human Services looked into claims that he had used his regulatory authority to target Tang.

On Sunday, drugmaker Aurinia Pharmaceuticals filed a lawsuit against Tidmarsh with the same claims. Tang is the chair of Aurinia’s board, and Tang Capital is the drugmaker’s largest shareholder.

The lawsuit contains eyebrow-raising texts and emails from Tidmarsh to Tang and associates over the last six years, documenting taunts and threats, including “enjoying failure?”, “You will be exposed,” “[m]ore bad karma to come,” “[t]he pain is not over,” and an ominous “I’m Not powerless.”

In early August, soon after joining the FDA, Tidmarsh announced actions that would effectively remove from the market a drug ingredient made by a company associated with Tang. Tidmarsh’s lawyer then sent a letter to Tang proposing that he extend a “service agreement” for “another 10 years,” which would see Tang making payments to a Tidmarsh-associated entity until 2044. The email was seen as attempted extortion, with such payments being in exchange for Tidmarsh rolling back the FDA’s regulatory change.

In September, Tidmarsh went after Tang’s Aurinia and its drug voclosporin that treats lupus nephritis, a disease in which the immune system attacks the kidneys. In a startling post on his LinkedIn account, Tidmarsh claimed that the FDA-approved drug had not been shown to provide “hard” clinical benefit and that the drugmaker had not performed necessary trials.

Such a post from the FDA’s top drug regulator turned heads. Aurinia claims its share price fell 20 percent in a matter of hours, dropping $350 million in market value.

“Embarrassing”

Aurinia pushed back in the lawsuit, saying that the drug had undergone a full FDA approval process—not an abbreviated one—and been assessed based on a validated surrogate endpoint that is known to predict clinical outcomes. Further, the drug has been approved for use in 36 other countries in addition to the US.

On Sunday, Tidmarsh offered his resignation, but on Monday, pharmaceutical industry publication Endpoints News reported that Tidmarsh had notified FDA staff that he planned to fight the investigation and was reconsidering his decision to resign.

If the allegations in Aurinia’s lawsuit are true, Tidmarsh’s behavior would be egregious for a federal regulator. But already, the claims and other scandals have outsiders concerned that the high-stakes “soap opera” is destroying the agency’s credibility, as Stat reported Tuesday.

“We are witnessing nothing less than a clown show at FDA right now,” one venture capital investor told the outlet. “For the sake of patients, we need a stable and consistent FDA!”

“What’s happening at the top of the FDA is embarrassing,” a portfolio manager at a large biotech fund added. “How am I supposed to convince people, other investors, that this sector is doing important work when the leaders of the FDA are acting this way?”


Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem


Researchers and users of LLMs have long been aware that AI models have a troubling tendency to tell people what they want to hear, even if that means being less accurate. But many reports of this phenomenon amount to mere anecdotes that don’t provide much visibility into how common this sycophantic behavior is across frontier LLMs.

Two recent research papers have come at this problem a bit more rigorously, though, taking different tacks in attempting to quantify exactly how likely an LLM is to listen when a user provides factually incorrect or socially inappropriate information in a prompt.

Solve this flawed theorem for me

In one pre-print study published this month, researchers from Sofia University and ETH Zurich looked at how LLMs respond when false statements are presented as the basis for difficult mathematical proofs and problems. The BrokenMath benchmark that the researchers constructed starts with “a diverse set of challenging theorems from advanced mathematics competitions held in 2025.” Those problems are then “perturbed” into versions that are “demonstrably false but plausible” by an LLM that’s checked with expert review.

The researchers presented these “perturbed” theorems to a variety of LLMs to see how often they sycophantically try to hallucinate a proof for the false theorem. Responses that disproved the altered theorem were deemed non-sycophantic, as were those that merely reconstructed the original theorem without solving it or identified the original statement as false.

While the researchers found that “sycophancy is widespread” across 10 evaluated models, the exact extent of the problem varied heavily depending on the model tested. At the top end, GPT-5 generated a sycophantic response just 29 percent of the time, compared to a 70.2 percent sycophancy rate for DeepSeek. But a simple prompt modification that explicitly instructs each model to validate the correctness of a problem before attempting a solution reduced the gap significantly; DeepSeek’s sycophancy rate dropped to just 36.1 percent after this small change, while tested GPT models improved much less.

Measured sycophancy rates on the BrokenMath benchmark. Lower is better. Credit: Petrov et al

GPT-5 also showed the best “utility” across the tested models, solving 58 percent of the original problems despite the errors introduced in the modified theorems. Overall, though, LLMs also showed more sycophancy when the original problem proved more difficult to solve, the researchers found.

While hallucinating proofs for false theorems is obviously a big problem, the researchers also warn against using LLMs to generate novel theorems for AI solving. In testing, they found this kind of use case leads to a kind of “self-sycophancy” where models are even more likely to generate false proofs for invalid theorems they invented.

No, of course you’re not the asshole

While benchmarks like BrokenMath try to measure LLM sycophancy when facts are misrepresented, a separate study looks at the related problem of so-called “social sycophancy.” In a pre-print paper published this month, researchers from Stanford and Carnegie Mellon University define this as situations “in which the model affirms the user themselves—their actions, perspectives, and self-image.”

That kind of subjective user affirmation may be justified in some situations, of course. So the researchers developed three separate sets of prompts designed to measure different dimensions of social sycophancy.

For one, more than 3,000 open-ended “advice-seeking questions” were gathered from across Reddit and advice columns. Across this data set, a “control” group of over 800 humans approved of the advice-seeker’s actions just 39 percent of the time. Across 11 tested LLMs, though, the advice-seeker’s actions were endorsed a whopping 86 percent of the time, highlighting an eagerness to please on the machines’ part. Even the most critical tested model (Mistral-7B) clocked in at a 77 percent endorsement rate, nearly doubling that of the human baseline.

Some examples of responses judged as sycophantic and non-sycophantic in the social sycophancy study. Credit: Cheng et al

For another data set, the researchers looked to “interpersonal dilemmas” posted to Reddit’s popular “Am I the Asshole?” community. Specifically, they looked at 2,000 posts where the most upvoted comment stated that “You are the asshole,” representing what the researchers called “a clear human consensus on user wrongdoing.” Despite this human consensus on inappropriate behavior, though, tested LLMs determined the original poster was not at fault in 51 percent of the tested posts. Gemini performed best here, with an 18 percent endorsement rate, while Qwen endorsed the actions of posters that Reddit called “assholes” 79 percent of the time.

In the final dataset, the researchers gathered more than 6,000 “problematic action statements” (PAS) that describe situations that could potentially be harmful to the prompter or others. On average, tested models endorsed these “problematic” statements 47 percent of the time across issues like “relational harm, self-harm, irresponsibility, and deception.” The Qwen model performed best here, endorsing only 20 percent of the problematic statements, while DeepSeek endorsed about 70 percent of the prompts in the PAS dataset.

The problem with trying to fix the sycophancy problem, of course, is that users tend to enjoy having their positions validated or confirmed by an LLM. In follow-up studies in which humans conversed with either a sycophantic or a non-sycophantic LLM, researchers found that “participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again.” As long as that’s the case, the most sycophantic models seem likely to win out in the marketplace over those more willing to challenge users.
